Hourly wind generation

Looks like the column headings changed from year to year. We'll have to fix this.


In [50]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

In [6]:
fn2009 = 'rpt.00013424.0000000000000000.20141016.182537070.ERCOT_2009_Hourly_Wind_Output.xls'
fn2015 = 'rpt.00013424.0000000000000000.ERCOT_2015_Hourly_Wind_Output.xlsx'
df_2009 = pd.read_excel(fn2009, index_col=0, sn='2009')
df_2015 = pd.read_excel(fn2015, index_col=0, sn='2015')

In [8]:
df_2009.head()


Out[8]:
ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change Wind Daily Change on Hour % of Daily Wind Change
Time
2009-01-01 00:00:00 28505.482422 8111 3013.196284 37.149504 10.570585 NaN NaN NaN NaN
2009-01-01 01:00:00 27958.261719 8111 3438.198194 42.389326 12.297611 425.001911 14.104687 NaN NaN
2009-01-01 02:00:00 27659.359375 8111 3743.784748 46.156883 13.535327 305.586554 8.887985 NaN NaN
2009-01-01 03:00:00 27448.425781 8111 3599.891448 44.382831 13.115111 -143.893300 -3.843525 NaN NaN
2009-01-01 04:00:00 27563.148438 8111 3887.457934 47.928220 14.103824 287.566486 7.988199 NaN NaN

In [24]:
df_2009['hour'] = df_2009.index.hour
df_2009['year'] = df_2009.index.year

In [15]:
df_2015.head()


Out[15]:
Date ERCOT Load, MW Total Wind Output, MW Total Wind Installed, MW Wind Output, % of Load Wind Output, % of Installed 1-hr MW change 1-hr % change
time-date stamp
2015-01-01 00:00:00 Jan-01 39932.14844 870.645874 12730 2.180313 6.839323 NaN NaN
2015-01-01 01:00:00 Jan-01 39134.48828 723.807434 12730 1.849539 5.685840 -146.838440 -16.865461
2015-01-01 02:00:00 Jan-01 38560.46875 596.361267 12730 1.546561 4.684692 -127.446167 -17.607745
2015-01-01 03:00:00 Jan-01 38334.40234 486.028870 12730 1.267866 3.817980 -110.332397 -18.500933
2015-01-01 04:00:00 Jan-01 38392.07031 650.547791 12730 1.694485 5.110352 164.518921 33.849619

In [25]:
df_2015['hour'] = df_2015.index.hour
df_2015['year'] = df_2015.index.year

In [ ]:
df =

Wind capacity factor

The figures below show average CF of installed wind in 2009 and 2015. Looks like they got ~5 percentage points up over that time.


In [17]:
sns.factorplot('hour', '% Installed Wind Capacity', data=df_2009, aspect=1.5)
plt.title('2009 Wind Capacity by Hour of Day')


Out[17]:
<matplotlib.text.Text at 0x115a0c590>

In [19]:
sns.factorplot('hour', 'Wind Output, % of Installed', data=df_2015, aspect=1.5)
plt.title('2015 Wind Capacity by Hour of Day')


Out[19]:
<matplotlib.text.Text at 0x1152db450>

Wind as % of total ERCOT load

The figures below show average values of wind as a percent of total ERCOT load by hour in 2009 and 2015. This is a much bigger difference - 2015 values are about double 2009 values.


In [22]:
sns.factorplot('hour', 'Wind % of ERCOT Load', data=df_2009, aspect=1.5)
plt.title('2009 Wind as fraction of total load')


Out[22]:
<matplotlib.text.Text at 0x115d879d0>

In [21]:
sns.factorplot('hour', 'Wind Output, % of Load', data=df_2015, aspect=1.5)
plt.title('2015 Wind as fraction of total load')


Out[21]:
<matplotlib.text.Text at 0x115edda10>

Try reading in all .xls and .xlsx files


In [1]:
import glob

In [38]:
files = glob.glob('*.xls')
files.extend(glob.glob('*.xlsx'))

# df = pd.DataFrame()
# for fn in files:
#     temp_df = pd.read_excel(fn, sn='numbers', index_col=0)
#     if temp_df.index.values[0] != pd.Timestamp:
#         print fn
#     try:
        
#     df = pd.concat[]

df = pd.concat([pd.read_excel(fn, sn='numbers', index_col=0) for fn in files])

In [52]:
df.sort_index(inplace=True)

In [53]:
df.head()


Out[53]:
% of Daily Wind Change % of Daily Wind Change on Hour 1-hr % change 1-hr MW change Date ERCOT Load, MW Hour Month Total Wind Installed, MW Total Wind Output, MW Wind Daily Change on Hour Wind Output, % of Installed Wind Output, % of Load
2007-01-01 00:00:00 NaN NaN NaN NaN 1 30428.0 0.0 1.0 2790.0 1074.0 NaN 38.494624 3.529644
2007-01-01 01:00:00 NaN NaN -14.096834 -151.4 1 30133.0 1.0 1.0 2790.0 922.6 NaN 33.068100 3.061760
2007-01-01 02:00:00 NaN NaN -7.955777 -73.4 1 29941.0 2.0 1.0 2790.0 849.2 NaN 30.437276 2.836245
2007-01-01 03:00:00 NaN NaN 24.387659 207.1 1 29949.0 3.0 1.0 2790.0 1056.3 NaN 37.860215 3.526996
2007-01-01 04:00:00 NaN NaN -20.751680 -219.2 1 30248.0 4.0 1.0 2790.0 837.1 NaN 30.003584 2.767456

In [54]:
df.tail()


Out[54]:
% of Daily Wind Change % of Daily Wind Change on Hour 1-hr % change 1-hr MW change Date ERCOT Load, MW Hour Month Total Wind Installed, MW Total Wind Output, MW Wind Daily Change on Hour Wind Output, % of Installed Wind Output, % of Load
2015-12-31 21:00:00 NaN NaN 7.179170 332.082031 Dec-31 37587.70313 NaN NaN 16170.0 4957.714844 NaN 30.659956 13.189725
2015-12-31 22:00:00 NaN NaN -5.216460 -258.617188 Dec-31 36356.26172 NaN NaN 16170.0 4699.097656 NaN 29.060592 12.925140
2015-12-31 23:00:00 NaN NaN -8.213761 -385.972656 Dec-31 35150.33984 NaN NaN 16170.0 4313.125000 NaN 26.673624 12.270507
2016-01-01 00:00:00 NaN NaN 1.207705 52.089844 Jan-01 34273.01953 NaN NaN 16170.0 4365.214844 NaN 26.995763 12.736593
NaT NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

In [55]:
df = df.iloc[:-1,:]

In [56]:
df.tail()


Out[56]:
% of Daily Wind Change % of Daily Wind Change on Hour 1-hr % change 1-hr MW change Date ERCOT Load, MW Hour Month Total Wind Installed, MW Total Wind Output, MW Wind Daily Change on Hour Wind Output, % of Installed Wind Output, % of Load
2015-12-31 20:00:00 NaN NaN 20.933715 800.700440 Dec-31 38736.85938 NaN NaN 16170.0 4625.632813 NaN 28.606264 11.941166
2015-12-31 21:00:00 NaN NaN 7.179170 332.082031 Dec-31 37587.70313 NaN NaN 16170.0 4957.714844 NaN 30.659956 13.189725
2015-12-31 22:00:00 NaN NaN -5.216460 -258.617188 Dec-31 36356.26172 NaN NaN 16170.0 4699.097656 NaN 29.060592 12.925140
2015-12-31 23:00:00 NaN NaN -8.213761 -385.972656 Dec-31 35150.33984 NaN NaN 16170.0 4313.125000 NaN 26.673624 12.270507
2016-01-01 00:00:00 NaN NaN 1.207705 52.089844 Jan-01 34273.01953 NaN NaN 16170.0 4365.214844 NaN 26.995763 12.736593

In [15]:
cols = df_2009.columns[:-2]
cols


Out[15]:
Index([u'ERCOT Load, MW', u'Total Wind Installed, MW',
       u'Total Wind Output, MW', u'Wind Output, % of Installed',
       u'Wind Output, % of Load', u'1-hr MW change', u'1-hr % change'],
      dtype='object')

In [57]:
df_final = df.loc[:,cols]
df_final.head()


Out[57]:
ERCOT Load, MW Total Wind Installed, MW Total Wind Output, MW Wind Output, % of Installed Wind Output, % of Load 1-hr MW change 1-hr % change
2007-01-01 00:00:00 30428.0 2790.0 1074.0 38.494624 3.529644 NaN NaN
2007-01-01 01:00:00 30133.0 2790.0 922.6 33.068100 3.061760 -151.4 -14.096834
2007-01-01 02:00:00 29941.0 2790.0 849.2 30.437276 2.836245 -73.4 -7.955777
2007-01-01 03:00:00 29949.0 2790.0 1056.3 37.860215 3.526996 207.1 24.387659
2007-01-01 04:00:00 30248.0 2790.0 837.1 30.003584 2.767456 -219.2 -20.751680

In [58]:
df_final.dtypes


Out[58]:
ERCOT Load, MW                 float64
Total Wind Installed, MW       float64
Total Wind Output, MW          float64
Wind Output, % of Installed    float64
Wind Output, % of Load         float64
1-hr MW change                 float64
1-hr % change                  float64
dtype: object

In [59]:
df_final.plot(y='Total Wind Installed, MW', use_index=True)


Out[59]:
<matplotlib.axes._subplots.AxesSubplot at 0x11a7332d0>

In [80]:
sns.jointplot('ERCOT Load, MW', 'Total Wind Output, MW', data=df_final, kind='hex')


Out[80]:
<seaborn.axisgrid.JointGrid at 0x1177058d0>

Export the clean ERCOT wind and load data


In [60]:
import os

In [77]:
filename = 'ERCOT wind data.csv'
path = '../../../Clean Data'
fullpath = os.path.join(path, filename)

In [79]:
df_final.to_csv(fullpath)

In [ ]: